{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# TokenCountingHandler - Demo Usage\n",
    "\n",
    "This notebook walks through how to use the TokenCountingHandler and how it can be used to track your prompt, completion, and embedding token usage over time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/lmarkewi/miniconda3/envs/llama_index/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "import tiktoken\n",
    "from llama_index.llms import OpenAI\n",
    "\n",
    "from llama_index import (\n",
    "    SimpleDirectoryReader,\n",
    "    VectorStoreIndex,\n",
    "    ServiceContext,\n",
    "    set_global_service_context,\n",
    ")\n",
    "from llama_index.callbacks import CallbackManager, TokenCountingHandler"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "Here, we setup the callback and the serivce context. We set a global service context so that we don't have to worry about passing it into indexes and queries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "token_counter = TokenCountingHandler(\n",
    "    tokenizer=tiktoken.encoding_for_model(\"gpt-3.5-turbo\").encode\n",
    ")\n",
    "\n",
    "callback_manager = CallbackManager([token_counter])\n",
    "\n",
    "llm = OpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
    "\n",
    "service_context = ServiceContext.from_defaults(\n",
    "    llm=llm, callback_manager=callback_manager\n",
    ")\n",
    "\n",
    "# set the global default!\n",
    "set_global_service_context(service_context)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Token Counting\n",
    "\n",
    "The token counter will track embedding, prompt, and completion token usage. The token counts are __cummulative__ and are only reset when you choose to do so, with `token_counter.reset_counts()`.\n",
    "\n",
    "### Embedding Token Usage\n",
    "\n",
    "Now that the service context is setup, let's track our embedding token usage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = SimpleDirectoryReader(\"../data/paul_graham\").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "index = VectorStoreIndex.from_documents(documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "16830\n"
     ]
    }
   ],
   "source": [
    "print(token_counter.total_embedding_token_count)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That looks right! Before we go any further, lets reset the counts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "token_counter.reset_counts()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### LLM + Embedding Token Usage\n",
    "\n",
    "Next, let's test a query and see what the counts look like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine(similarity_top_k=4)\n",
    "respons = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Embedding Tokens:  8 \n",
      " LLM Prompt Tokens:  3969 \n",
      " LLM Completion Tokens:  124 \n",
      " Total LLM Token Count:  4093 \n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(\n",
    "    \"Embedding Tokens: \",\n",
    "    token_counter.total_embedding_token_count,\n",
    "    \"\\n\",\n",
    "    \"LLM Prompt Tokens: \",\n",
    "    token_counter.prompt_llm_token_count,\n",
    "    \"\\n\",\n",
    "    \"LLM Completion Tokens: \",\n",
    "    token_counter.completion_llm_token_count,\n",
    "    \"\\n\",\n",
    "    \"Total LLM Token Count: \",\n",
    "    token_counter.total_llm_token_count,\n",
    "    \"\\n\",\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, LlamaIndex used 8 token to embed the query text, sent 3987 prompt tokens to the LLM, and received 149 completion tokens from the LLM."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced Usage\n",
    "\n",
    "The token counter tracks each token usage event in an object called a `TokenCountingEvent`. This object has the following attributes:\n",
    "\n",
    "- prompt -> The prompt string sent to the LLM or Embedding model\n",
    "- prompt_token_count -> The token count of the LLM prompt\n",
    "- completion -> The string completion received from the LLM (not used for embeddings)\n",
    "- completion_token_count -> The token count of the LLM completion (not used for embeddings)\n",
    "- total_token_count -> The total prompt + completion tokens for the event\n",
    "- event_id -> A string ID for the event, which aligns with other callback handlers\n",
    "\n",
    "These events are tracked on the token counter in two lists:\n",
    "\n",
    "- llm_token_counts\n",
    "- embedding_token_counts\n",
    "\n",
    "Let's explore what these look like!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num LLM token count events:  2\n",
      "Num Embedding token count events:  1\n"
     ]
    }
   ],
   "source": [
    "print(\"Num LLM token count events: \", len(token_counter.llm_token_counts))\n",
    "print(\"Num Embedding token count events: \", len(token_counter.embedding_token_counts))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This makes sense! The previous query embedded the query text, and then made 2 LLM calls (since the top k was 4, and the default chunk size is 1024, two seperate calls need to be made so the LLM can read all the retrieved text).\n",
    "\n",
    "Next, let's quickly see what these events look like for a single event."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "prompt:  Context information is below. \n",
      "---------------------\n",
      "I could write essays again, I wrote a bunch abo ...\n",
      "\n",
      "prompt token count:  3261 \n",
      "\n",
      "completion:  The author wrote short stories and programmed on an IBM 1401 in 9th grade, and later on a TRS-80 mic ...\n",
      "\n",
      "completion token count:  32 \n",
      "\n",
      "total token count 3293\n"
     ]
    }
   ],
   "source": [
    "print(\"prompt: \", token_counter.llm_token_counts[0].prompt[:100], \"...\\n\")\n",
    "print(\n",
    "    \"prompt token count: \", token_counter.llm_token_counts[0].prompt_token_count, \"\\n\"\n",
    ")\n",
    "\n",
    "print(\"completion: \", token_counter.llm_token_counts[0].completion[:100], \"...\\n\")\n",
    "print(\n",
    "    \"completion token count: \",\n",
    "    token_counter.llm_token_counts[0].completion_token_count,\n",
    "    \"\\n\",\n",
    ")\n",
    "\n",
    "print(\"total token count\", token_counter.llm_token_counts[0].total_token_count)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index",
   "language": "python",
   "name": "llama_index"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
